Creating Lexical Resources for Endangered Languages
Abstract
This paper examines approaches to generating lexical resources for endangered languages. Our algorithms construct bilingual dictionaries and multilingual thesauruses using public Wordnets and a machine translator (MT). Because our work relies on only one bilingual dictionary between an endangered language and an “intermediate helper” language, it is applicable to languages that have few existing resources.
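The general idea of reaching other languages through an intermediate language can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not spell out; it is a toy pivot lookup under stated assumptions. The function name `pivot_translate`, the placeholder source words, and the synset identifiers are all hypothetical, and plain dictionaries stand in for a real Wordnet and for any machine translator.

```python
from collections import defaultdict


def pivot_translate(src_to_pivot, pivot_to_synsets, synset_to_target):
    """Collect target-language candidates for each source word.

    src_to_pivot:     source word -> list of intermediate-language words
    pivot_to_synsets: intermediate word -> list of synset IDs (toy IDs here)
    synset_to_target: synset ID -> list of target-language words
    Source words with no candidates are omitted from the result.
    """
    candidates = defaultdict(set)
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in pivot_words:
            for synset_id in pivot_to_synsets.get(pivot_word, ()):
                targets = synset_to_target.get(synset_id, ())
                if targets:
                    candidates[src_word].update(targets)
    return {word: sorted(targets) for word, targets in candidates.items()}


# Hypothetical toy data: "w1" and "w2" are placeholder source-language entries.
src_to_pivot = {"w1": ["water"], "w2": ["unlisted"]}
pivot_to_synsets = {"water": ["syn-water"]}
synset_to_target = {"syn-water": ["agua"]}

result = pivot_translate(src_to_pivot, pivot_to_synsets, synset_to_target)
print(result)
```

Every candidate produced this way is only as reliable as the sense linking in the middle step, so a real system would need some ranking or filtering of ambiguous senses, which this sketch leaves out.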
Similar resources
Indigenous Languages of Indonesia: Creating Language Resources for Language Preservation
In this paper, we report a survey of language resources in Indonesia, primarily of indigenous languages. We look at the official Indonesian language (Bahasa Indonesia) and 726 regional languages of Indonesia (Bahasa Nusantara) and list all the available lexical resources (LRs) that we could gather. This paper suggests that the smaller regional languages may remain relatively unstudied, and unkn...
Automatically Creating Multilingual Lexical Resources
The thesis proposes creating bilingual dictionaries and Wordnets for languages without many lexical resources using resources of resource-rich languages. Our work will have the advantage of creating lexical resources, reducing time and cost and at the same time improving the quality of resources created.
Time to change the “D” in “DEL”
The “D” in “DEL” stands for “documenting” – a code word for linguists that means the collection of linguistic data in audio and written form. The DEL (Documenting Endangered Languages) program run by the NSF and NEH is thus centered around building and archiving data resources for endangered languages. This paper is an argument for extending the ‘D’ to include “describing” languages in terms of...
LERIL: Collaborative Effort for Creating Lexical Resources
The paper reports on efforts taken to create lexical resources pertaining to Indian languages, using the collaborative model. The lexical resources being developed are: (1) transfer lexicon and grammar from English to several Indian languages, and (2) dependency tree bank of annotated corpora for several Indian languages. The dependency trees are based on the Paninian model. (3) is an attempt t...
Towards a Common Conceptual Framework of Language Documentation
Language represents shared conventionalization of concepts by all speakers. Hence language documentation preserves information far beyond a collection of sound shapes, lexical forms, and grammatical structures. The preservation of linguistically conventionalized conceptual structure is even more crucial for endangered languages, since this information is very often not available elsewhere. Howeve...